5 research outputs found

    Dynamic Dependency Collapsing

    In this dissertation, we explore the concept of dynamic dependency collapsing. When the clock speed is fixed, performance increases in computer architecture are always obtained by exploiting additional parallelism. We show that further improvements are possible even when the available parallelism in programs is exhausted. The improvement comes from executing in parallel instructions that would ordinarily have been serialized. We call this concept dependency collapsing. We survey existing techniques that exploit parallelism and show which of them fall under the umbrella of dependency collapsing. We then introduce two dependency collapsing techniques of our own. The first technique collapses data dependencies by fusing two normally dependent instructions and executing them together. We show that exploiting the additional parallelism generated by collapsing these dependencies results in a performance increase. Our second technique collapses resource dependencies to execute instructions that would normally have been serialized due to resource constraints in the processor. We show that it is possible to take advantage of larger in-processor structures while avoiding the power and area penalty this often implies.
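    The effect of collapsing a data dependency can be illustrated with a toy scheduling model. The sketch below is not taken from the dissertation: the `Instr` type, the `schedule` function, the single-cycle latency, the unlimited-resource machine, and the rule that an already fused instruction cannot serve as a fusion partner again are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str      # destination register
    srcs: tuple    # source registers


def schedule(program, fuse=False):
    """Return the cycle in which each instruction executes.

    Assumed toy model: unlimited resources, single-cycle latency. With
    fuse=True a consumer may execute in the same cycle as one of its
    producers, provided that producer was not itself fused.
    """
    cycle = []       # cycle[i]: cycle in which instruction i executes
    fused = []       # fused[i]: instruction i was fused onto a producer
    producer = {}    # register -> index of the instruction that last wrote it

    def available(src, fused_producer=None):
        p = producer.get(src)
        if p is None:
            return 0                 # live-in value, ready from the start
        if p == fused_producer:
            return cycle[p]          # collapsed: runs alongside its producer
        return cycle[p] + 1          # normal case: wait for the result

    for i, ins in enumerate(program):
        best = max((available(s) for s in ins.srcs), default=0)
        did_fuse = False
        if fuse:
            for s in ins.srcs:
                p = producer.get(s)
                if p is not None and not fused[p]:
                    t = max(available(x, fused_producer=p) for x in ins.srcs)
                    if t < best:
                        best, did_fuse = t, True
        cycle.append(best)
        fused.append(did_fuse)
        producer[ins.dest] = i
    return cycle


# A fully serialized chain: each instruction consumes the previous result.
chain = [
    Instr("a", ("x", "y")),
    Instr("b", ("a", "z")),
    Instr("c", ("b", "w")),
    Instr("d", ("c", "v")),
]
print(schedule(chain))             # without fusion
print(schedule(chain, fuse=True))  # with pairs of dependent instructions fused
```

    In this model the four-instruction chain has no parallelism left to exploit, yet fusing dependent pairs halves its length from four cycles to two.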

    Complexity-effective rename table design for rapid speculation recovery

    Register renaming is a commonly used technique for removing false data dependencies in contemporary superscalar processors. It works by assigning physical registers to the registers defined by the architecture while instructions are being decoded. These assignments are kept in an alias table. When superscalar processors use techniques such as branch prediction, a misprediction can leave the processor in a state it should not be in. The rename assignments made by mistakenly fetched instructions must be undone so that the processor returns to a correct state. In contemporary processors, the techniques that restore the rename table compromise on either restore speed or hardware complexity. This study presents an extendable technique that has lower hardware complexity, yet can restore the rename table in at most two clock cycles. The design proposes the use of differently sized FIFO queues for each architectural register to hold checkpoints. This study shows that the proposed structure performs better than existing techniques except in a few exceptional cases. Besides the rename table design, a study was also done on minimizing the FIFO queue size for each architectural register without significantly affecting performance. This study proposes the use of genetic algorithms to successfully balance area usage and performance in a reasonable amount of time.
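    One way to picture per-register checkpoint queues is the software model below. It is only a sketch under stated assumptions, not the design from the study: the class `FifoRenameTable`, its method names, the use of sequence numbers to identify instructions, and the stall-when-full policy are all hypothetical. A hardware design would restore all queues in parallel, whereas this model simply loops over them.

```python
from collections import deque


class FifoRenameTable:
    """Toy rename table with one bounded history queue per architectural register."""

    def __init__(self, queue_sizes, initial_map):
        self.map = dict(initial_map)       # architectural -> physical register
        self.sizes = dict(queue_sizes)     # per-register queue capacity
        self.history = {reg: deque() for reg in initial_map}

    def rename(self, seq, arch_reg, new_phys):
        """Map arch_reg to new_phys; return False (stall) if its queue is full."""
        queue = self.history[arch_reg]
        if len(queue) >= self.sizes[arch_reg]:
            return False
        queue.append((seq, self.map[arch_reg]))  # remember the old mapping
        self.map[arch_reg] = new_phys
        return True

    def commit(self, seq):
        """Drop checkpoints of instructions up to and including seq."""
        for queue in self.history.values():
            while queue and queue[0][0] <= seq:
                queue.popleft()

    def recover(self, branch_seq):
        """Undo every rename made by instructions younger than branch_seq."""
        for reg, queue in self.history.items():
            while queue and queue[-1][0] > branch_seq:
                _, old_phys = queue.pop()
                self.map[reg] = old_phys


table = FifoRenameTable({"r1": 2, "r2": 1}, {"r1": "p1", "r2": "p2"})
table.rename(1, "r1", "p10")
table.rename(2, "r2", "p11")            # assume a branch at sequence number 2
table.rename(3, "r1", "p12")            # wrong-path rename
stalled = not table.rename(4, "r2", "p14")  # r2's single-entry queue is full
table.recover(2)                        # misprediction: roll back past the branch
print(table.map, stalled)
```

    Giving frequently renamed registers deeper queues than rarely renamed ones is the kind of sizing decision the abstract describes optimizing; here the capacities are just constructor arguments.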

    Mower: A new design for non-blocking misprediction recovery

    Mower is a micro-architecture technique which targets the branch misprediction penalty in superscalar processors. It speeds up the misprediction recovery process by dynamically evicting stale instructions and correcting the Register Alias Table (RAT) using explicit control dependency tracking. Tracking control dependencies is accomplished by using simple bit matrices. This low-overhead technique allows the recovery process to overlap with instruction fetching, renaming and scheduling from the correct path. Our evaluation of the mechanism indicates that it yields performance very close to ideal recovery and provides up to 5% speed-up and 2% reduction in power consumption compared to a recovery mechanism using a reorder buffer and a walker. The simplicity of the mechanism should permit easy implementation of Mower in an actual processor.
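    The idea of tracking control dependencies with a bit matrix can be sketched as follows. This is an illustrative model and not Mower's actual design: the class `ControlDepMatrix`, its methods, and the slot allocation policy are assumptions. Each in-flight instruction holds one row of bits, and each unresolved branch owns one column.

```python
class ControlDepMatrix:
    """Toy bit matrix: rows are in-flight instructions, columns are unresolved branches."""

    def __init__(self, num_branch_slots):
        self.num_branch_slots = num_branch_slots
        self.pending = 0        # bitmask of columns owned by unresolved branches
        self.rows = {}          # instruction id -> bitmask of branches it depends on
        self.slot_owner = {}    # column index -> id of the branch that owns it

    def insert(self, instr_id, is_branch=False):
        """Record a fetched instruction; return its column if it is a branch."""
        slot = None
        if is_branch:
            free = [s for s in range(self.num_branch_slots)
                    if not (self.pending >> s) & 1]
            if not free:
                raise RuntimeError("no free branch slot: fetch must stall")
            slot = free[0]
        # The instruction depends on every older branch that is still unresolved.
        self.rows[instr_id] = self.pending
        if slot is not None:
            self.pending |= 1 << slot
            self.slot_owner[slot] = instr_id
        return slot

    def resolve_correct(self, slot):
        """Branch was predicted correctly: clear its column everywhere."""
        mask = ~(1 << slot)
        self.pending &= mask
        del self.slot_owner[slot]
        for instr_id in self.rows:
            self.rows[instr_id] &= mask

    def mispredict(self, slot):
        """Branch was mispredicted: evict every instruction with its bit set."""
        victims = {i for i, bits in self.rows.items() if (bits >> slot) & 1}
        for instr_id in victims:
            del self.rows[instr_id]
        # Free the column of this branch and of any evicted younger branch.
        for s, owner in list(self.slot_owner.items()):
            if s == slot or owner in victims:
                self.pending &= ~(1 << s)
                del self.slot_owner[s]
        return sorted(victims)


matrix = ControlDepMatrix(num_branch_slots=4)
matrix.insert("i0")
b0 = matrix.insert("b0", is_branch=True)
matrix.insert("i1")
b1 = matrix.insert("b1", is_branch=True)
matrix.insert("i2")
squashed = matrix.mispredict(b0)
print(squashed)   # instructions fetched after b0, including the nested branch b1
```

    Because eviction is a single column lookup, instructions inserted from the correct path after the misprediction are unaffected, which is what lets recovery overlap with new fetches in this model.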

    LaZy superscalar

    LaZy Superscalar is a processor architecture which delays the execution of fetched instructions until their results are needed by other instructions. This approach eliminates dead instructions and provides the necessary means to fuse dependent instructions across multiple control dependencies by explicitly tracking control and data dependencies through a matrix-based scheduler. We present this novel redesign of scheduling, recovery and commit mechanisms and evaluate the performance of the proposed architecture. Our simulations using the SPEC 2006 benchmark suite indicate that LaZy Superscalar can achieve significant speed-ups while providing respectable power savings compared to a conventional superscalar processor.
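    Demand-driven execution and the dead-instruction elimination it implies can be approximated in software by a backward pass over an instruction window. The sketch below is a loose analogy, not the hardware mechanism from the paper: the function `lazy_execute`, the `(dest, srcs)` instruction encoding, and the notion of a `demanded` register set are assumptions.

```python
def lazy_execute(program, demanded):
    """Return indices of the instructions whose results are actually needed.

    program:  list of (dest, srcs) tuples in program order.
    demanded: registers whose final values are needed, for example because
              a store or a branch consumes them.
    """
    needed = set(demanded)
    executed = []
    # Walk from youngest to oldest, so each register is matched with the
    # most recent instruction that wrote it.
    for index in range(len(program) - 1, -1, -1):
        dest, srcs = program[index]
        if dest in needed:
            needed.discard(dest)   # this instruction supplies the value
            needed.update(srcs)    # ...and demands its own sources in turn
            executed.append(index)
    return executed[::-1]


window = [
    ("r1", ("r0",)),   # dead: r1 is overwritten before r4 reads it
    ("r2", ("r1",)),   # dead: nothing demands r2
    ("r1", ("r3",)),
    ("r4", ("r1",)),
]
print(lazy_execute(window, demanded={"r4"}))
```

    Only the last two instructions execute here; the first two are never demanded and are dropped without being executed.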